Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Aerial scene recognition is a fundamental task in remote sensing and has
recently received increased interest. While visual information from overhead
images, combined with powerful models and efficient algorithms, yields
considerable performance on scene recognition, it still suffers from variation
in ground objects, lighting conditions, etc. Inspired by the multi-channel
perception theory in cognitive science, in this paper we explore a novel
audiovisual aerial scene recognition task that uses both images and sounds as
input, with the aim of improving aerial scene recognition performance. Based
on the observation that specific sound events are more likely to be heard at a
given geographic location, we propose to exploit knowledge from sound events
to improve aerial scene recognition. For this purpose, we have constructed a
new dataset named the AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE).
With the help of this dataset, we evaluate three proposed approaches for
transferring sound event knowledge to the aerial scene recognition task in a
multimodal learning framework, and show the benefit of exploiting audio
information for aerial scene recognition. The source code is publicly
available for reproducibility purposes.
Comment: ECCV 202
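The multimodal idea above can be sketched as decision-level (late) fusion: each modality produces per-class scores, and the fused prediction weights them together. This is a minimal illustration, not the paper's actual transfer approaches; the class scores, weighting scheme, and `audio_weight` value are all invented for the example.

```python
import math

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(image_logits, audio_logits, audio_weight=0.3):
    """Weighted average of per-modality class probabilities."""
    p_img = softmax(image_logits)
    p_aud = softmax(audio_logits)
    return [(1 - audio_weight) * i + audio_weight * a
            for i, a in zip(p_img, p_aud)]

# Hypothetical scores over three scene classes.
image_logits = [2.0, 0.5, 0.1]   # visual model favours class 0
audio_logits = [0.2, 1.8, 0.3]   # sound events favour class 1
fused = late_fusion(image_logits, audio_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

With the visual modality weighted more heavily, its evidence dominates here; raising `audio_weight` would let strong sound-event evidence flip the decision.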
GazeDirector: Fully articulated eye gaze redirection in video
We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e., we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.
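Step 1 above relies on flow-field warping. A minimal sketch of backward warping with nearest-neighbour sampling is shown below; in GazeDirector the flow comes from the fitted eye region model, whereas here the flow and the tiny image grid are invented, with a uniform one-pixel shift.

```python
def warp(image, flow):
    """Backward warp: out[y][x] = image[y - fy][x - fx], nearest neighbour.

    Pixels whose source falls outside the image are filled with 0.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fy, fx = flow[y][x]
            sy, sx = y - fy, x - fx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out

image = [[1, 2, 3],
         [4, 5, 6]]
# Constant flow: move every pixel one column to the right.
flow = [[(0, 1)] * 3 for _ in range(2)]
warped = warp(image, flow)
```

Real systems use sub-pixel flow with bilinear interpolation rather than integer nearest-neighbour lookup; the structure of the loop is the same.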
Investigating non-classical correlations between decision fused multi-modal documents
Correlation has been widely used to facilitate various information retrieval methods such as query expansion, relevance feedback, document clustering, and multi-modal fusion. In particular, correlation and independence are important issues when fusing different modalities that influence a multi-modal information retrieval process. The basic idea of correlation is that one observable can help predict or enhance another. In quantum mechanics, quantum correlation, called entanglement, is a sort of correlation between observables measured in atomic-size particles, even when these particles are not collected in ensembles. In this paper, we examine a multi-modal fusion scenario that might be similar to that encountered in physics by first measuring two observables (i.e., text-based relevance and image-based relevance) of a multi-modal document without relying on an ensemble of multi-modal documents already labeled in terms of these two variables. Then, we investigate the existence of non-classical correlations between pairs of multi-modal documents. Although there are some basic differences between entanglement and the classical correlation encountered in the macroscopic world, we investigate the existence of this kind of non-classical correlation through violation of the Bell inequality. Here, we experimentally test several novel association methods in a small-scale experiment. However, in the current experiment we did not find any violation of the Bell inequality. Finally, we present a series of discussions, which may provide theoretical and empirical insights and inspiration for future development of this direction.
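The Bell-inequality test mentioned above is usually run in its CHSH form: two binary (+1/-1) observables, each measured under two settings, with classical correlations bounded by |S| <= 2. The sketch below computes S from outcome pairs; the data is made up for illustration and, being classical, stays at the bound, mirroring the paper's negative result.

```python
def correlation(pairs):
    """Empirical E[a*b] over a list of (+1/-1, +1/-1) outcome pairs."""
    return sum(a * b for a, b in pairs) / len(pairs)

def chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return e_ab + e_ab2 + e_a2b - e_a2b2

# Perfectly correlated classical outcomes under every setting pair,
# e.g. text relevance and image relevance always agreeing.
same = [(1, 1), (-1, -1)] * 10
S = chsh(correlation(same), correlation(same),
         correlation(same), correlation(same))
```

Any genuinely non-classical correlation would push S above 2 (up to 2*sqrt(2) for entangled states); classical data like this cannot.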
Towards a Taxonomy of Cognitive RPA Components
Robotic Process Automation (RPA) is a discipline that is
increasingly growing hand in hand with Artificial Intelligence (AI) and
Machine Learning, enabling so-called cognitive automation. In this
context, the existing RPA platforms that include AI-based solutions
classify their components, i.e., the constituent parts of a robot that
perform a set of actions, in a way that seems to obey market or business
decisions instead of common-sense rules. To be more precise, components
that present similar functionality are identified with different names and
grouped in different ways depending on the platform that provides the
components. Therefore, analysing different cognitive RPA platforms
to check their suitability for a specific need is typically a time-consuming
and error-prone task. To overcome this problem, and to provide users with
support in the development of an RPA project, this paper proposes a method
for the systematic construction of a taxonomy of cognitive RPA components.
Moreover, the method is applied to components that solve selected
real-world use cases from industry, obtaining promising results.
Ministerio de Economía y Competitividad TIN2016-76956-C3-2-R; Junta de Andalucía CEI-12-TIC021; Centro para el Desarrollo Tecnológico Industrial P011-19/E0
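The core normalisation step such a taxonomy needs, collapsing vendor-specific component names onto a shared functional category, can be sketched as follows. The platform names, component names, and categories are all hypothetical, not taken from the paper's case studies.

```python
# Hypothetical vendor catalogues: different names, overlapping functionality.
PLATFORM_COMPONENTS = {
    "PlatformA": {"Read Text From Image": "OCR",
                  "Detect Language": "NLP"},
    "PlatformB": {"Screen OCR": "OCR",
                  "Sentiment Scoring": "NLP"},
}

def build_taxonomy(platforms):
    """Group vendor-specific component names under canonical categories."""
    taxonomy = {}
    for platform, components in platforms.items():
        for name, category in components.items():
            taxonomy.setdefault(category, []).append((platform, name))
    return taxonomy

taxonomy = build_taxonomy(PLATFORM_COMPONENTS)
```

Once components are keyed by canonical category rather than vendor name, comparing platforms for a given need becomes a lookup instead of a manual survey.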
Learning Tversky Similarity
In this paper, we advocate Tversky's ratio model as an appropriate basis for
computational approaches to semantic similarity, that is, the comparison of
objects such as images in a semantically meaningful way. We consider the
problem of learning Tversky similarity measures from suitable training data
indicating whether two objects tend to be similar or dissimilar.
Experimentally, we evaluate our approach to similarity learning on two image
datasets, showing that it performs very well compared to existing methods.
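Tversky's ratio model, which the paper takes as its basis, scores two feature sets by their common features relative to the features each has that the other lacks, with weights alpha and beta on the two directions of difference. A minimal sketch with invented feature sets:

```python
def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky similarity of feature sets a and b, in [0, 1]."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0

x = {"wheel", "engine", "door"}
y = {"wheel", "engine", "wing"}
s = tversky(x, y)                      # 2 / (2 + 0.5 + 0.5) = 2/3
j = tversky(x, y, alpha=1.0, beta=1.0) # with alpha = beta = 1: Jaccard, 2/4
```

Learning a Tversky measure then amounts to fitting alpha, beta (and the feature representation) from pairs labelled similar or dissimilar; note the measure is asymmetric whenever alpha != beta.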
Detecting Human Activities Based on a Multimodal Sensor Data Set Using a Bidirectional Long Short-Term Memory Model: A Case Study
Human falls are one of the leading causes of fatal unintentional injuries
worldwide. Falls result in a direct financial cost to health systems, and indirectly,
to society’s productivity. Unsurprisingly, human fall detection and prevention is
a major focus of health research. In this chapter, we present and evaluate several
bidirectional long short-term memory (Bi-LSTM) models using a data set provided
by the Challenge UP competition. The main goal of this study is to detect 12 human
daily activities (six daily human activities, five falls, and one post-fall activity)
derived from multi-modal data sources: wearable sensors, ambient sensors, and
vision devices. Our proposed Bi-LSTM model leverages data from accelerometer
and gyroscope sensors located at the ankle, right pocket, belt, and neck of the subject.
We utilize a grid search technique to evaluate variations of the Bi-LSTM model and
identify a configuration that presents the best results. The best Bi-LSTM model
achieved good results for precision and F1-score: 43.30% and 38.50%, respectively.
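The grid search mentioned above exhaustively evaluates every combination of hyperparameter values and keeps the best-scoring configuration. The sketch below shows the search loop only; the grid values are invented, and the scoring function is a deterministic stand-in for what would actually be training a Bi-LSTM and measuring F1 on the Challenge UP labels.

```python
from itertools import product

# Hypothetical hyperparameter grid for the Bi-LSTM.
GRID = {
    "hidden_units": [64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
}

def evaluate(config):
    """Placeholder score; a real run would train and validate a Bi-LSTM."""
    return config["hidden_units"] / 128 - config["dropout"] * 0.1

def grid_search(grid, score_fn):
    """Try every combination of grid values; return the best config and score."""
    keys = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = grid_search(GRID, evaluate)
```

The cost grows multiplicatively with each added hyperparameter (here 2 x 2 x 2 = 8 trainings), which is why grids are usually kept coarse.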